Abstract

We want to find the optimal strategy for displaying advertisements (e.g.
banners, videos) in given locations at given times under some realistic
dynamic constraints. Our primary goal is to maximize the expected revenue
in a given period of time, i.e. the total profit produced by the impressions,
which depends on profit-generating events such as the impressions
themselves and the ensuing clicks and registrations. Moreover, we must take
into consideration the possibility that the constraints could change in time
in a way that cannot always be foreseen.

Keywords: web advertisement, linear programming, data mining, ma- 
chine learning. 

1 Introduction 

We want to find the optimal strategy for displaying (delivering) advertisements
("creatives") in order to achieve different goals (maximum total profit,
maximum average profit per advertisement, maximum visibility of the campaign)
under some realistic constraints.

More specifically, we need to find the optimal number of "impressions"
(displays of advertisements at a given time and at a given "location") under
some realistic dynamic constraints that both limit the possibility of certain
creatives and limit the number of impressions in certain locations and/or
moments in time. A location can represent a place where an advertisement can
be displayed. It could also include information on the user or the user's
category (it could be a combination of a location and a user category, a
location and a user, a location and a set of keywords inserted into the web
page, etc.). Hence the model we are considering can be used to target users by
their categories. Similar optimization problems have already been treated in
the scientific literature (see, e.g., [LNA+99]).



We are given certain "creatives" (advertisements to be displayed, e.g. banners,
videos, etc.), "campaigns" (sets of related creatives), certain "locations"
and a period of time (a set of "time frames"). At a given moment in time we have
an expected profit for each creative of a given campaign in a given time frame
and location. Usually the profit generated by 1000 impressions, called "eCPM"
(effective cost per mille), is considered because the profit of one impression
is very small. For the purpose of this paper we will only consider the profit
of a single impression.

The profit of the web-page's owner depends on the profit-generating events 
that have been agreed upon by the advertiser and the web-page's owner. These 
events can be the impression itself, a click on the advertisement or a registration
of any sort (e.g. registration into the advertised site, purchase of the advertised
item, etc.), or any combination of these events.

We denote the expected profit of a single "impression" as the "impression 
profit". The impression profit is therefore the sum of the profits obtained by 
all the profit-generating events such as impressions themselves, their ensuing 
clicks and registrations of all types ("steps"). In such a way we can avoid 
keeping track of click-through rates and different registration rates. This choice 
is a compromise between performance and generality, since it makes our model 
less precise and slightly less general: we are not considering campaigns with 
separate budgets for different events; we cannot estimate the expected profit 
of an impression as precisely as when different rates for different events are 
considered. 
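
As a small illustration of the definition above, the following sketch folds
the profits of the individual events into a single impression profit and
relates it to the eCPM. The event names, rates and payouts are invented for
the example, and the helper `impression_profit` is hypothetical; it is not part
of the implementation described in this paper.

```python
def impression_profit(payouts, rates):
    """Expected profit of one impression.

    payouts: profit paid for each event type (hypothetical values).
    rates:   expected number of such events per impression
             (1.0 for the impression itself).
    """
    return sum(payouts[event] * rates[event] for event in payouts)


def ecpm(profit_per_impression):
    """Profit generated by 1000 impressions."""
    return 1000.0 * profit_per_impression


# Illustrative numbers only.
payouts = {"impression": 0.001, "click": 0.20, "registration": 5.0}
rates = {"impression": 1.0, "click": 0.01, "registration": 0.0002}

p = impression_profit(payouts, rates)
print(round(p, 6), round(ecpm(p), 3))
```

Storing only `p` for each impression-event is what allows the model to ignore
the separate click-through and registration rates.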

The number of impressions ("supply") on a given location at a given time is
limited by the traffic of the corresponding webpage. It also depends on time in 
a way that can be only partially predicted. Moreover the maximum profit for a 
given campaign ("demand") could be limited by a predefined budget. 

Our primary goal is to maximize our expected revenue which is given by 
the expected total price paid. A secondary goal is to maximize the profit of 
a single impression, i.e. obtain the maximum revenue with the minimum effort 
(minimum number of impressions). Therefore we wish to maximize a weighted
sum of all expected profits obtained in all locations in the period of time
under consideration. An additional goal which is considered in the constraints 
is to maximize the visibility of the campaigns. 

Taking into account supply and demand constraints makes our model a 
special instance of a "transportation problem" for which very efficient solutions 
exist (see [Dan63] and [BBG77]).

The complexity of the model brings up the additional problem of deciding 
between simplifying the model and considering smaller problems, i.e. optimizing 
more locally. 

In order to apply our optimization we need to make a projection of the future 
supply and a projection of the impression profits onto our period. Impressions 
are only possible on the points allowed by the scheduling of the campaigns. 

The projection of the impression profits should also try to "guess" how the 
profit of an impression changes in time. 

The projection algorithms should take into account different periodicities
(daily, weekly, etc.). More precise projections can be achieved through machine
learning techniques in that the weights of different periodicities are computed
during a training phase, possibly with different strategies for different
nodes or other features (time of the day, day of the week, etc.).

Moreover we cannot assume the immutability of the constraints of the prob- 
lem in the period of time under consideration. For this reason we have to decide 
how globally or locally we want to optimize the problem and we have to con- 
tinuously readjust to new constraints, new expected impression profits and new 
expected supply. 

2 Notation 

We denote by $C_i$ the $i$-th campaign (set of creatives) and by $B_{i,j}$ its
$j$-th creative, by $L_l$ the $l$-th location, by $T_k$ the $k$-th time frame.
We denote by $x_{i,j,k,l}$ the number of impressions (the "impression value")
of $B_{i,j}$ at time frame $T_k$ and at location $L_l$.

We denote by $p_{i,j,k,l}$ the "impression profit", i.e. the profit of one
single impression of creative $B_{i,j}$ at location $L_l$ and at time $T_k$.

For the sake of simplicity we will be omitting indices in our expressions
whenever our objects do not depend on them.

3 Configurations 

We can consider our problem as the problem of finding the optimal impression 
values for the entries in a tridimensional matrix, i.e. for all points in a tridi- 
mensional discrete finite space given by a grid defined by couples (campaign, 
creative), time and location. 

We refer to a single point in this tridimensional discrete finite space as an
"impression-event" (or simply an "impression" when this is clear from the
context). We call any choice of values for all the impression-events a
"configuration". An impression is in fact characterized by a couple
(campaign, creative), a location and a time. Our goal is to choose the optimal
delivery of each possible impression, i.e. an optimal configuration. We will
simply refer to the number of impressions of an impression-event as the
"impression value" or its "value".

Moreover some of our constraints restrict the possible points in such a grid.
We can see these restrictions as additional trivial constraints of the form
$x_{i,j,k,l} = 0$ on such points. We refer to the points that do not contradict
any constraints as "possible points", i.e. points that are not forced to have a
zero value by the constraints. A subset of the possible points is the set of
points that are allowed by the schedule of the campaigns. We refer to such
points as "admissible points".

Each admissible point in this space describes a dimension of our optimization 
problem. The worst case in our problem is produced when all points inside the 
cube of size given by the number of couples (campaign, creative), the number
of time frames, the number of locations, are admissible. Our problem lives in
a space whose number of dimensions is given by the number of all admissible 
points. Hence in the worst case the number of dimensions is the product of the 
number of couples (campaign, creative), the number of locations and the number 
of time frames considered. 

Remark 1. In practice we do not know the supply and cannot decide in advance
how many impressions of a given creative should be delivered. Thus we need to
translate the number of impressions $x_{i,j,k,l}$ in terms of probability of
delivery. We transform a configuration into a map that associates a couple
$(k, l)$ with the probability of delivery of $B_{i,j}$ for all possible
corresponding couples $(i, j)$.

Given a set of $t$-uples $C$, we introduce the following notation for subsets of
$(t-1)$-uples:

$$C[i \to a] = \{(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_t) \mid (x_1, \dots, x_{i-1}, a, x_{i+1}, \dots, x_t) \in C\};$$

$$C[i \to *] = \{(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_t) \mid \exists v\, (x_1, \dots, x_{i-1}, v, x_{i+1}, \dots, x_t) \in C\},$$

i.e. we are considering respectively

• $(t-1)$-uples obtained from the $t$-uples in $C$ whose $i$-th component is
$a$, in which the $i$-th component has been removed, and

• $(t-1)$-uples obtained from the $t$-uples in $C$ where the $i$-th component
has been removed independently of its value.

In the same way, if more components are removed in parallel, we introduce
the notation $C[i_1 \to a_1, \dots, i_n \to a_n]$ for $(t-n)$-uples, where
$a_j \in \mathbb{N} \cup \{*\}$ and $i_j \in \mathbb{N}$ for $j \in \{1, \dots, n\}$.
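
The notation above can be read as a filter followed by a projection. The
sketch below is one possible reading of it, with admissible points stored as
4-tuples (campaign, creative, time frame, location); the function name
`project` and the sample set `C` are assumptions made for the illustration.

```python
def project(C, fixed):
    """Return C[i_1 -> a_1, ..., i_n -> a_n].

    C:     a set of t-tuples.
    fixed: dict mapping 1-based component positions to a required value,
           or to "*" when any value is accepted.
    The listed components are removed from every matching tuple.
    """
    result = set()
    for tup in C:
        if all(a == "*" or tup[i - 1] == a for i, a in fixed.items()):
            result.add(tuple(v for pos, v in enumerate(tup, start=1)
                             if pos not in fixed))
    return result


# Admissible points as (campaign, creative, time frame, location) tuples.
C = {(1, 1, 1, 1), (1, 2, 1, 1), (1, 1, 2, 1), (2, 1, 1, 1), (2, 1, 1, 2)}

# Time frames in which campaign 1 has at least one admissible point.
print(project(C, {1: 1, 2: "*", 4: "*"}))   # C[1 -> 1, 2 -> *, 4 -> *]
# Campaigns admissible at time frame 1 and location 1.
print(project(C, {2: "*", 3: 1, 4: 1}))     # C[2 -> *, 3 -> 1, 4 -> 1]
```

Because the result is a set, its cardinality directly gives quantities such as
the number of time frames of a campaign or the number of campaigns competing
for a location at a given time.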

Pictorially, we could see a single configuration $C = (x_{i,j,k,l})_{i,j,k,l}$ as a
tridimensional matrix.

[Figure: a configuration as a tridimensional matrix of impression values; one axis is labelled "time frame".]

4 Realistic Model



We want to consider a realistic model in which several constraints of different 
nature are taken into account. 

4.1 The Constraints 

We distinguish between the primary (physical) constraints of the problem, the 
secondary ones (commercial and optional) and the learning constraints (required 
by the learning phase if it is included in the mathematical model).

4.2 Primary constraints 

The primary constraints are given by the desired scheduling of the campaigns, 
(the impossibility of certain creatives at certain times and locations), by the 
limited supply of impressions and by a (possibly) limited demand (campaign's 
budget): 

1. The scheduling of the campaigns limits the admissible points: certain
creatives $B_{i,j}$ are only possible at certain time frames and locations.
Typically a campaign (set of creatives) begins and ends at certain times and
its creatives are limited to certain locations, hours of the day, days of the
week, etc.

2. Any location at a given time receives a limited supply of impressions ("lo- 
cation supply"), which solely depends on the traffic of its page (more 
precisely on the portion of traffic given to the ad-serving optimizer); 

3. For any given campaign a given total profit may not be exceeded ("cam- 
paign demand") because only a finite campaign budget can be available. 

Remark 2. A campaign can have an unbounded budget, e.g. a campaign that 
only pays the client for an actual purchase (highest step of a registration). 

For the sake of simplicity, from now on we will implicitly assume that the first
primary constraint (item 1 above) is always satisfied.

4.3 Secondary constraints 

The secondary constraints may be of a commercial nature and depend on the 
conditions in the contract between the web-site's owner, the advertiser and 
possibly the ad-serving company. These constraints could be enforced, up to a
certain extent, in real time while monitoring the delivery, although having them
as constraints is better for the optimality of the solution.

They are necessary to increase the visibility of a certain campaign/creative: 

1. Any given creative/campaign should not last less than a given period, e.g. 
the period in which the campaign is scheduled. We enforce this by setting 
a minimum for the number of impressions for each possible time frame.

2. We would like to avoid having only one creative at a given location and
time frame when more than one choice is available.

4.4 Learning constraints 

Moreover, if the learning phase on the performance of new creatives and new 
locations is to be included in the mathematical model, additional constraints 
should be considered. One way to force the system to learn on new creatives and 
new nodes can be a constraint of this form: for each new couple (creative, node) 
we must have a minimum number of impressions in all (or some initial) possible 
time frames. 

4.5 The Goal 

We want to maximize our expected revenue, which is given by the total profit 
received in a given configuration. 

5 A linear programming model 

Under the false but nevertheless mild hypothesis that the impression profit is 
constant with respect to its value we can assume that our constraints are linear. 
This assumption is not true in general because there is no linear dependence 
between the total profit generated by an impression-event and an impression 
value, i.e. displaying the same advertisement x times on the same node, possibly 
more than once to the same user, does not necessarily produce x times the profit 
produced by one single display. 

Since we are ultimately interested in the probability of delivery and since
integer linear programming is computationally infeasible in general (NP-hard),
a possible approach to this problem is real linear programming: we approximate
our discrete problem with a continuous one and we do not mind considering a
real number of impressions.

5.1 Formalized constraints 

The points that do not contradict the first primary constraint will form the 
unknowns of our model. In such a way we can avoid including inequalities for
the first primary constraint.

5.1.1 Primary constraints 

We do not include the first primary constraints for the reasons given above and 
assume that in our expressions all indices run over points that do not contradict 
the first primary constraints. 

Supply and demand are formalized as follows:

Second primary constraint:

$$\forall l, k \quad \sum_{i,j} x_{i,j,k,l} \le S_{l,k}, \qquad \text{(Supply)} \tag{1}$$

where $S_{l,k}$ is the supply at location $L_l$ and at time $T_k$.

Third primary constraint:

$$\forall i \quad \sum_{j,k,l} p_{i,j,k,l}\, x_{i,j,k,l} \le D_i, \qquad \text{(Demand)} \tag{2}$$

where $D_i$ is the budget of the $i$-th campaign.

5.1.2 Secondary constraints 

The secondary constraints are formalized as follows:

First secondary constraint:

$$\forall i, k \quad \sum_{j,l} x_{i,j,k,l} \ge \mu_{i,k}, \qquad \text{(lasting)} \tag{3}$$

where $\mu_{i,k}$ is the desired minimum delivery of impressions of the $i$-th
campaign at time $T_k$.

Second secondary constraint:

$$\forall (l,k) \in \mathcal{D} \;\; \forall i, j \quad x_{i,j,k,l} \le P_{l,k} \cdot S_{l,k}, \qquad \text{(no overflow)} \tag{4}$$

where $P_{l,k} \in [0, 1]$ (usually close to 1) defines how much a single
creative can occupy a location at a given time frame and where $\mathcal{D}$ is
the set of indices corresponding to locations and time frames where at least 2
different creatives are possible.

Remark 3. The second secondary constraint should be limited to those
cases in which, at a given location and time frame, more than one pair of
campaign and creative is possible, because otherwise constraint (4) would
prevent the node from being filled with impressions even when this could be
possible.

5.1.3 Learning constraints 

If the learning phase is included in the model a constraint should force the new 
creatives and new locations to have a minimum number of impressions: 

$$\forall\, \text{new } (j,l) \;\; \forall i, k \quad x_{i,j,k,l} \ge \lambda_{i,j,k,l}. \tag{5}$$

We are also implicitly assuming that the unknowns are non-negative, i.e.

$$\forall i,j,k,l \quad x_{i,j,k,l} \ge 0. \tag{6}$$
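
To make the role of the supply constraint (1), the demand constraint (2) and
the non-negativity condition (6) concrete, the sketch below checks a candidate
configuration against them. It is only an illustration: the dictionary layout,
the function name `violated_constraints` and the numbers are assumptions, and a
real model would be handed to a linear programming solver rather than checked
by hand.

```python
from collections import defaultdict


def violated_constraints(x, p, supply, budget, eps=1e-9):
    """List the primary constraints broken by configuration x.

    x, p:   dicts keyed by (i, j, k, l) -> impression value / profit.
    supply: dict keyed by (l, k) -> S_{l,k}.
    budget: dict keyed by i -> D_i (campaigns without an entry are unbounded).
    """
    used = defaultdict(float)      # impressions per (location, time frame)
    earned = defaultdict(float)    # profit per campaign
    problems = []
    for (i, j, k, l), value in x.items():
        if value < -eps:
            problems.append(("non-negativity", (i, j, k, l)))
        used[(l, k)] += value
        earned[i] += p[(i, j, k, l)] * value
    for key, total in used.items():
        if total > supply[key] + eps:
            problems.append(("supply", key))
    for i, total in earned.items():
        if i in budget and total > budget[i] + eps:
            problems.append(("demand", i))
    return problems


# Two campaigns competing for one location in one time frame.
p = {(1, 1, 1, 1): 0.004, (2, 1, 1, 1): 0.002}
supply = {(1, 1): 1000}
budget = {1: 2.0}                  # campaign 2 has an unbounded budget

x = {(1, 1, 1, 1): 600, (2, 1, 1, 1): 400}
print(violated_constraints(x, p, supply, budget))
```

In this example the supply is respected but campaign 1 would earn more than its
budget, so only the demand constraint is reported.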

6 Existence of a solution



We see that there is no guarantee of consistency once the secondary and learning 
constraints are introduced, even if we exclude the first primary constraints. In 
general we need to solve the system of inequalities in order to make sure that 
there is indeed a solution. Nevertheless, if we carefully choose
$\lambda_{i,j,k,l}$ and $\mu_{i,k}$ we can avoid that the first secondary or
learning constraints contradict the second and third primary constraints.

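In particular we have the following facts.

Fact 1. If we choose

$$\mu_{i,k} \le \min\left\{ \frac{D_i}{|C[1 \to i, 2 \to *, 4 \to *]|\, M},\; \min_{t \in C[1 \to i, 2 \to *, 3 \to k]} \frac{S_{t,k}}{|C[2 \to *, 3 \to k, 4 \to t]|} \right\} \tag{7}$$

then the semi-algebraic set defined by the inequalities (1), (2) and (3) is
not empty.

Proof. The first secondary constraint (3) and our choice allow us to have

$$\sum_{j,l} x_{i,j,k,l} \le \frac{D_i}{|C[1 \to i, 2 \to *, 4 \to *]|\, M}. \tag{8}$$

Therefore, $M$ being an upper bound on the impression profits, for any campaign
$C_i$ we have

$$\sum_{j,k,l} p_{i,j,k,l}\, x_{i,j,k,l} \le M \sum_{j,k,l} x_{i,j,k,l} \le M \sum_{k \in C[1 \to i, 2 \to *, 4 \to *]} \sum_{j,l} x_{i,j,k,l} \le M \sum_{k \in C[1 \to i, 2 \to *, 4 \to *]} \frac{D_i}{|C[1 \to i, 2 \to *, 4 \to *]|\, M} = D_i. \tag{9}$$

Moreover we can take

$$\sum_{j,l} x_{i,j,k,l} \le \min_{t \in C[1 \to i, 2 \to *, 3 \to k]} \frac{S_{t,k}}{|C[2 \to *, 3 \to k, 4 \to t]|}. \tag{10}$$

Hence for any couple $(k, l)$ of time frame and location we have

$$\sum_{i,j} x_{i,j,k,l} \le \sum_{i \in C[2 \to *, 3 \to k, 4 \to l]} \sum_{j,s} x_{i,j,k,s} \le \sum_{i \in C[2 \to *, 3 \to k, 4 \to l]} \min_{t \in C[1 \to i, 2 \to *, 3 \to k]} \frac{S_{t,k}}{|C[2 \to *, 3 \to k, 4 \to t]|} \le \sum_{i \in C[2 \to *, 3 \to k, 4 \to l]} \frac{S_{l,k}}{|C[2 \to *, 3 \to k, 4 \to l]|} = S_{l,k}. \tag{11}$$
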
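□

Fact 2. If we choose

$$\lambda_{i,j,k,l} \le \min\left\{ \frac{D_i}{|C[1 \to i]|\, M},\; \frac{S_{l,k}}{|C[3 \to k, 4 \to l]|} \right\} \tag{12}$$

then the semi-algebraic set defined by the inequalities (1), (2) and (5) is not
empty.

Proof. The learning constraint (5) and our choice allow us to have

$$x_{i,j,k,l} \le \frac{D_i}{|C[1 \to i]|\, M}, \tag{13}$$

from which it follows that for any campaign $C_i$:

$$\sum_{j,k,l} p_{i,j,k,l}\, x_{i,j,k,l} \le M \sum_{j,k,l} \frac{D_i}{|C[1 \to i]|\, M} = D_i. \tag{14}$$

Moreover we can also take

$$x_{i,j,k,l} \le \frac{S_{l,k}}{|C[3 \to k, 4 \to l]|}, \tag{15}$$

from which it follows that for any couple $(l, k)$ of location and time we have

$$\sum_{i,j} x_{i,j,k,l} \le \sum_{i,j} \frac{S_{l,k}}{|C[3 \to k, 4 \to l]|} = S_{l,k}. \tag{16}$$

□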

Theorem 1. The system of inequalities formed by second and third primary 
constraints, first secondary constraints and learning constraints has at least a 
solution. 

Proof. This follows from Fact 1 and Fact 2. □
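
The argument behind Fact 2 can be checked numerically. The sketch below
assumes that $M$ is the largest impression profit and gives every admissible
point the smaller of a per-campaign share of the budget and a per-slot share of
the supply; the resulting configuration then respects both the supply and the
demand constraints. The function name `uniform_feasible_point` and all the
numbers are hypothetical.

```python
from collections import Counter


def uniform_feasible_point(points, p, supply, budget):
    """Give every admissible point the bound suggested by Fact 2.

    points: iterable of admissible (i, j, k, l) tuples.
    Returns a dict (i, j, k, l) -> impression value.
    """
    points = list(points)
    M = max(p[pt] for pt in points)                       # bound on the profits
    per_campaign = Counter(i for i, _, _, _ in points)    # |C[1 -> i]|
    per_slot = Counter((k, l) for _, _, k, l in points)   # |C[3 -> k, 4 -> l]|
    x = {}
    for (i, j, k, l) in points:
        by_budget = budget[i] / (per_campaign[i] * M)
        by_supply = supply[(l, k)] / per_slot[(k, l)]
        x[(i, j, k, l)] = min(by_budget, by_supply)
    return x


points = [(1, 1, 1, 1), (1, 2, 1, 1), (2, 1, 1, 1), (2, 1, 2, 1)]
p = {(1, 1, 1, 1): 0.004, (1, 2, 1, 1): 0.002,
     (2, 1, 1, 1): 0.001, (2, 1, 2, 1): 0.003}
supply = {(1, 1): 900, (1, 2): 300}      # keyed by (location, time frame)
budget = {1: 1.6, 2: 4.0}

x = uniform_feasible_point(points, p, supply, budget)
for point, value in sorted(x.items()):
    print(point, round(value, 3))
```

Any choice of the learning minima below these values therefore leaves room for
a configuration that satisfies the primary constraints.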

Remark 4. A slightly simpler model in which only primary constraints
(including the first ones) are considered is an instance of the "Hitchcock's
style transportation problem" [F.L41].

6.1 The objective function 

We want to maximize our expected revenue, which is given by the sum of all 
expected profits received in a given configuration. 

Hence we can estimate the revenue by taking the weighted sum of the expected
revenue in a given configuration $C$. Such a sum will be our "objective
function":

$$F(C) = \sum_{i,j,k,l} p_{i,j,k,l}\, x_{i,j,k,l}, \tag{17}$$

where $p_{i,j,k,l}\, x_{i,j,k,l}$ is the expected profit generated by $B_{i,j}$
at location $L_l$ and at time $T_k$.

7 Simplified models: transportation problems



The standard algorithm for solving linear programming problems is the well 
known simplex algorithm. The simplex algorithm, although fast in many prac- 
tical situations, is exponential in the worst case and does not scale well enough 
for the size of problems we want to consider. Moreover no general linear pro- 
gramming algorithm is known to be strongly polynomial. 

An important speed-up can be achieved by simplifying the model we have 
considered. In particular we could consider a simpler model that could fall into 
the category of "transportation problems". For such problems very efficient
algorithms are known, such as the "stepping stone algorithm" (see [Dan63] and
[BBG77]).

The classical transportation problem is a linear programming problem whose
constraints describe demands $d_i$ to be met and supplies $s_j$ to be delivered:

$$\sum_i x_{i,j} = s_j, \qquad \sum_j x_{i,j} = d_i.$$

We say that a transportation problem is "balanced" if the total demand 
equals the total supply. Taking into consideration only balanced problems is not 
a real restriction because one can always put oneself into this case by adding an 
extra dummy supply or extra dummy demand with zero cost / gain. 
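
The balancing step described above can be sketched as follows. The helper
`balance` is hypothetical and only adjusts the supply and demand vectors; the
dummy entry it adds would receive zero cost or gain in the objective function.

```python
def balance(supplies, demands):
    """Return balanced copies of the supply and demand lists.

    A dummy entry (to be given zero cost/gain in the objective) absorbs
    the difference between total supply and total demand.
    """
    supplies, demands = list(supplies), list(demands)
    gap = sum(supplies) - sum(demands)
    if gap > 0:
        demands.append(gap)       # dummy demand takes the unused supply
    elif gap < 0:
        supplies.append(-gap)     # dummy supply covers the unmet demand
    return supplies, demands


print(balance([500, 300], [200, 400]))
print(balance([100], [150, 50]))
```

After balancing, the two totals coincide, so algorithms that assume a balanced
problem can be applied without changing the optimal value.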

In our case demands could describe a required number of creatives and supplies
the number of impressions that can be shown in given locations and time
frames.

A more general transportation problem goes under the name of "Hitchcock's
style transportation problem" ([F.L41]), which better approximates our problem
in that the demand is obtained by multiplying the impression values by a factor
(representing their profit). The constraints are of the following kind:

$$\sum_i x_{i,j} = s_j, \qquad \sum_j c_{i,j}\, x_{i,j} = d_i,$$

where $c_{i,j}$ is the value (cost or profit) associated to $x_{i,j}$.

8 Maximizing the value of impressions 

A problem related to that of maximizing the revenue is the problem
of minimizing the number of used impressions under some constraints on the
revenue generated by each campaign. This is equivalent to the problem of
maximizing the profit generated by a single impression.

In other words we want to maximize the revenue and secondly minimize
the number of impressions such that the maximum revenue is achieved, i.e. we
want to maximize the average profit of impressions, provided that the maximum
revenue is achieved.

If we exclude secondary and learning constraints, this can be formalized as a
special instance of the "Hitchcock's style transportation problem" (see [F.L41]).

Given the constraints

$$\forall l, k \quad \sum_{i,j} x_{i,j,k,l} \le S_{l,k}, \tag{19}$$

$$\forall i \quad \sum_{j,k,l} p_{i,j,k,l}\, x_{i,j,k,l} = d_i, \tag{20}$$

we want to minimize the following objective function

$$\sum_{i,j,k,l} x_{i,j,k,l}, \tag{21}$$

i.e. the total number of impressions.

9 Projections

In order to apply our optimization algorithms we need to have at least a
projection of the supply and a projection of the expected profit of all
impressions allowed by the first primary constraint. The supply and impression
profits could be estimated by taking a proper weighted average from the
historical data. The projection should take into account different factors:
episodic factors and possibly different periodicities (daily, weekly, yearly,
etc.).

9.1 Projecting the profit

The profit of an impression-event may depend on the periodicity of its campaign
and of its location. Since the eCPM tends to change slowly in time, it can be
predicted better than the supply. If we want to determine the expected profit
for an impression-event $(B_{i,j}, T_k, L_l)$ we can take some average profit
from our historical data on "similar" events. Our strategy is to use the most
accurate and recent available information in the historical data.

A simplified version of our algorithm can be described by the following
procedure (more subcases are considered by the actual algorithm):

1. Try to find enough impression-events of the form $(B_{i,j}, T, L_l)$ where
$T$ is a similar time (possibly same day of the week and same hour) starting
from the closest dates first.

2. Try to find enough impression-events of the form $(B, T, L_l)$ and
$(B_{i,j}, T, L)$ and take a weighted average of the two averages, where $L$ is
any location, $B$ is any creative, $T$ is a similar time.

3. Try to do the same as in the previous step but with $(B, T, L_l)$ and
$(B_{i,j'}, T, L)$ where $B_{i,j'}$ is a different creative belonging to the
same campaign.

4. Try to find enough impression-events (only on the same campaign, only
on the same node, etc.).
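
The fallback procedure for projecting the impression profit in Section 9.1 can
be sketched as a cascade of progressively looser queries on the historical
data. The record layout, the threshold `enough`, the weight `w_location` and
the exact form of the last step are assumptions made for this illustration;
the actual algorithm considers more subcases.

```python
from statistics import mean


def project_profit(history, campaign, creative, location, slot,
                   enough=3, w_location=0.5):
    """Fallback estimate of the profit of one impression.

    history: list of (campaign, creative, location, slot, profit) records,
             most recent first; `slot` identifies a similar time, e.g.
             (weekday, hour).
    """
    def profits(match):
        found = [r[4] for r in history if r[3] == slot and match(r)]
        return found[:enough] if len(found) >= enough else None

    # Step 1: same creative, same location.
    exact = profits(lambda r: r[:3] == (campaign, creative, location))
    if exact:
        return mean(exact)
    # Step 2: any creative at this location / this creative anywhere.
    at_loc = profits(lambda r: r[2] == location)
    anywhere = profits(lambda r: r[:2] == (campaign, creative))
    if at_loc and anywhere:
        return w_location * mean(at_loc) + (1 - w_location) * mean(anywhere)
    # Step 3: as step 2, with sibling creatives of the same campaign.
    siblings = profits(lambda r: r[0] == campaign and r[1] != creative)
    if at_loc and siblings:
        return w_location * mean(at_loc) + (1 - w_location) * mean(siblings)
    # Step 4: whatever is available on the same campaign or the same node.
    loose = [r[4] for r in history if r[0] == campaign or r[2] == location]
    return mean(loose) if loose else None


history = [
    (1, 2, "home", ("Mon", 9), 0.004),
    (1, 2, "news", ("Mon", 9), 0.006),
    (2, 1, "home", ("Mon", 9), 0.002),
    (1, 2, "news", ("Mon", 9), 0.004),
    (2, 1, "home", ("Mon", 9), 0.004),
    (1, 2, "news", ("Mon", 9), 0.005),
    (2, 1, "home", ("Mon", 9), 0.003),
]
estimate = project_profit(history, 1, 2, "home", ("Mon", 9))
print(round(estimate, 6))
```

Here only one exact match exists, so the estimate falls back to step 2 and
averages what is known about the location with what is known about the
creative.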



9.2 Projecting the supply 

A model for the projection of the supply should take into consideration the
periodicity of the location, i.e. some sites are more often visited in particular
periods of the year, days, hours, etc. More periodicities may concur, e.g. a site
may be visited more often on a specific day of the week and at a specific hour of
the day, and may also have an episodic surge in the number of visitors for a short
period because of some unpredictable event. A mathematical model that could
describe the concurrent effects of different periodicities could be that of
superimposing waves, where each wave describes a different factor, e.g. a weekly
factor and a contingent factor.

9.2.1 Weighted average 

In many practical cases it is enough to consider a weighted average of the
supply in the previous two weeks and at similar hours, in a fashion similar to
the procedure of Section 9.1 used for the projection of the profit.
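
A minimal sketch of such a weighted average is given below, assuming that the
past supply of one location is indexed by (day, hour) and that the same hour
one and two weeks earlier is used. The weights are illustrative and the
function name `project_supply` is hypothetical.

```python
def project_supply(past, day, hour, weights=(0.6, 0.4)):
    """Project the supply of one location for (day, hour).

    past:    dict (day_index, hour) -> observed supply.
    weights: weight of the same hour one week and two weeks earlier
             (illustrative values, not taken from the paper).
    Missing observations are skipped and the weights renormalised.
    """
    total, norm = 0.0, 0.0
    for weeks_back, w in enumerate(weights, start=1):
        seen = past.get((day - 7 * weeks_back, hour))
        if seen is not None:
            total += w * seen
            norm += w
    return total / norm if norm else None


past = {(3, 9): 1200, (10, 9): 1000}
print(project_supply(past, 17, 9))
```

Giving the more recent week the larger weight follows the strategy of
preferring the most recent information.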

9.2.2 Machine learning 

Regression analysis through machine-learning techniques such as support vector
machines can be a viable approach for the problem of properly choosing the
weights of the average of the different "features" (e.g. periodicities).
Non-linear kernels could also be taken into consideration if they perform
significantly better for the data sets under consideration.

10 Dynamic and stochastic nature of our problem

In reality this approach has a serious drawback: we are making a false
assumption because by applying linear programming we are assuming that in
the period under consideration the expected impression profits $p_{i,j,k,l}$,
the expected supplies $S_{l,k}$, and our constraints do not change. We are also
erroneously assuming that $p_{i,j,k,l}$ is a constant with respect to
$x_{i,j,k,l}$.

10.1 Non-linear and dynamic problem 

Our problem is in fact non-linear and dynamic. To make things worse its state 
depends on external factors that cannot always be forecast (e.g. new campaigns 
can come into play). Hence we are forced to continuously readapt to the new 
constraints. Thus we can expect a good performance if our expected impression 
profit and constraints do not change too much in the period of time under 
consideration.

10.2 Learning phase

An additional problem comes from the fact that the system has to learn how new
creatives and new locations perform. The corresponding theoretical problem
goes under the name of "exploration-exploitation trade-off", i.e. the challenge
of deciding between learning how some resources perform versus exploiting the
ones that have so far performed better. In practice we need to decide between
displaying advertisements that help the system learn about their performance
versus displaying those that generate a more immediate reward. This problem
has been addressed in [AN99], where a technique based on the Gittins index has
been used.

10.3 How far into the future 

This also poses the problem of deciding how fast we want to update our in- 
formation and how far in time we want our optimization to "see" our problem 
(i.e. how globally we want to solve the problem). A global solution could be a 
very bad one if the conditions of the problem were to change too quickly. We 
might want to give a different weight to an expected profit far in time in order 
to compensate for possible changes and so limit the risk. 

We must also take into consideration the stochastic nature of our data. For 
example we might use historical data to extract the standard deviation for the 
profit of the impression and use it to better assess the risk. 

A possibility could be to have an adaptive or semi-automatic system in
which the time span given to the optimizer is adaptively or manually adjusted
when there is a high probability of a significant change in the constraints, e.g.
a new campaign is likely to come into play, the estimation of the profit of an
impression is not stable enough, etc.

11 Targeting users 

The approach we have so far presented optimizes the delivery of advertisements 
in both space (nodes) and time (time frames). It does not explicitly take users' 
profiles into consideration. 

Nevertheless the very same algorithms and code can be used to take into 
account users' profiles by encapsulating the profile information into the node 
information. Therefore we should simply store a pair (node, profile) into a
single "extended node". The result would be that

• the supply is projected onto triples (node, profile, time);

• the eCPM and the delivery are computed for quintuples
(campaign, creative, node, profile, time).

No modification of the code is necessary. 
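
The idea of an extended node can be illustrated with a few lines of code: the
pair (node, profile) is used wherever a plain node would be used. The log
format and the function name `extended_supply` are assumptions made for the
example.

```python
from collections import defaultdict


def extended_supply(log):
    """Aggregate past impressions per extended node and time frame.

    log: iterable of (node, profile, time_frame) records, one per impression.
    Returns a dict ((node, profile), time_frame) -> number of impressions.
    """
    supply = defaultdict(int)
    for node, profile, time_frame in log:
        supply[((node, profile), time_frame)] += 1
    return dict(supply)


log = [("home", "sports", 1), ("home", "sports", 1), ("home", "travel", 1)]
print(extended_supply(log))
```

Since the pair is hashable it can serve as a dictionary key exactly like a
node identifier, which is why the rest of the code does not need to change.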

12 Computational considerations



The large number of unknowns and constraints in this general approach can
pose a serious problem to its computational feasibility.

12.1 Reducing dimensions and constraints 

We could reduce the number of dimensions by clustering similar attributes, i.e. 
combinations of locations and time frames (see [XN99] for an approach to this
problem) or by simplifying our model. For example we could simplify our model 
as follows: 

• We could restrict our optimization problem to periods of time in which
the time constraints do not change. This greatly reduces the number of
unknowns but could also bring us to suboptimal solutions.

• We could avoid considering the secondary and learning constraints within 
the model and have them enforced during the delivery. 

• We can use a time horizon, beyond which all the time frames are considered 
jointly. 

• We could treat similar attributes as one single attribute, e.g. an impression
at location $A_1$ at time $t_1$ could perform similarly to an impression at
location $A_2$ at some other time $t_2$.

• Projecting the "impression profits" is a costly operation because of the 
sheer number of points to be considered. This operation can be sped up 
by assuming that similar points produce the same profit. 
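
As an illustration of the time horizon mentioned in the list above, the sketch
below merges all the time frames beyond the horizon into a single aggregate
frame, which reduces the number of unknowns. Integer time frames counted from
the current one and the function name `apply_time_horizon` are assumptions of
the example.

```python
def apply_time_horizon(supply, horizon):
    """Merge all time frames beyond `horizon` into one aggregate frame.

    supply: dict (location, time_frame) -> projected supply, with integer
            time frames counted from the current one.
    Frames up to `horizon` are kept; later ones are summed under
    the frame index horizon + 1.
    """
    merged = {}
    for (location, frame), value in supply.items():
        key = (location, min(frame, horizon + 1))
        merged[key] = merged.get(key, 0) + value
    return merged


supply = {("home", 1): 100, ("home", 2): 120, ("home", 3): 90, ("home", 4): 110}
print(apply_time_horizon(supply, horizon=2))
```

With a horizon of two frames the four original frames collapse into three, the
last one carrying the total supply of everything that lies further ahead.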

12.2 Other optimization algorithms 

The simplex algorithm is not the only known efficient algorithm for linear
programming. Interior-point algorithms provide a valid alternative and are
polynomial (more specifically "weakly polynomial"; see footnote 1). Totally
different approaches to the optimization could be possible, e.g. gradient-based,
genetic, etc. Unfortunately these other approaches are suboptimal because they
are intrinsically local. Moreover they do not exploit the linear nature of the
constraints and of the objective function.

1 An algorithm is "strongly polynomial" if and only if 

• the number of operations in the arithmetic model of computation is bounded by a 
polynomial in the number of integers in the input instance; and 

• the space used by the algorithm is bounded by a polynomial in the size of the input. 

An algorithm which runs in polynomial time but which is not strongly polynomial is said to
run in weakly polynomial time. The existence of a strongly polynomial algorithm for real
linear programming is still an open problem.

12.3 Tuning the supply projection

Properly projecting the supply from historical data can be a hard task due to
the fact that the available historical data might not correspond to the real
traffic but only to a possibly variable portion of the real traffic which is
given to the ad-server. This problem might be impossible to solve if no
regularity is present in the data. A machine learning approach may tackle it
better than taking a simple weighted average of selected profits at some
previous periods with constant weights. For instance this is the case if an
optimal method is non-linear.

13 Benchmarking linear solvers for ad-serving 
problems 

In order to test the computational feasibility of two of the main linear
programming solvers on our problems we have used the free and open-source
lp_solve [lps] and glpk [glp] libraries. They both come as C libraries; both
have wrapper interfaces in higher level languages such as Java. The result was
a clear win for glpk, at least for the type of problems under our consideration.
Other alternative libraries are bpmpd [Mes], soplex [Wun], pcx [JSMS].

14 Software Implementation 

We have implemented an ad-server optimizer in both C and in Java (see [CG]).
We have used the glpk library for solving the mathematical model and libsvm
(see [CCL]) for automatically learning how to project future supply (Internet
traffic).

14.1 Features 

Our implementation has the following features: 

• Constraints: We consider in our model the three types of primary con- 
straints and provide as option the secondary and learning constraints. 

• Projection of the profit: We perform a projection of the impression 
profits from historical data, by taking into account different periodicities 
(e.g. hourly, daily). 

• Projection of the supply: We perform a projection of the supply from
historical data, by machine learning (support vector machines, libsvm)
and/or by a fixed weighted average, both of which take into account different
periodicities.

• Time horizon: We can set a time horizon in order to reduce the number 
of unknowns.

14.2 Extracting the data from the database

Our implementation requires as input: 

1. historical data necessary for projecting the impression profits and the fu- 
ture supply, 

2. campaign data (budgets for each campaign), 

3. scheduling data (set of possible impressions not contradicting the first
primary constraints).

14.3 Main steps of the optimization 

The algorithm can be roughly subdivided into the following macro steps:

1. Historical data is read. 

2. The past supply is extracted from the historical data. 

3. The future impression profits are projected from the historical data. 

4. The future supply is projected from the past supply. 

5. The mathematical model for the optimization problem is constructed. 

6. The model is solved. 

7. The solution of the problem is translated in terms of probabilities of de- 
livery. 

8. The delivery probability is only used on the very next time frame. 

9. This procedure is repeated on the next time frame. 
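
Step 7 above, the translation of the solution into probabilities of delivery,
can be sketched as follows. The sketch assumes that, within one time frame, the
probability of each creative at a location is proportional to its planned
number of impressions; this normalisation and the function name
`delivery_probabilities` are assumptions, not a description of the actual
implementation.

```python
from collections import defaultdict


def delivery_probabilities(x, frame):
    """Turn the impression values of one time frame into probabilities.

    x: dict (i, j, k, l) -> optimal impression value.
    Returns a dict l -> {(i, j): probability} for time frame `frame`.
    """
    planned = defaultdict(dict)
    for (i, j, k, l), value in x.items():
        if k == frame and value > 0:
            planned[l][(i, j)] = value
    probabilities = {}
    for l, values in planned.items():
        total = sum(values.values())
        probabilities[l] = {c: v / total for c, v in values.items()}
    return probabilities


x = {(1, 1, 1, 1): 300, (2, 1, 1, 1): 100, (1, 1, 2, 1): 50}
print(delivery_probabilities(x, frame=1))
```

Only the probabilities of the very next time frame are used; the whole
procedure is then repeated with updated projections, as in steps 8 and 9.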

15 Results on real data provided by Neodata 

Our prototype has been used on some real data used at Neodata and has been 
compared against the results produced by the optimizer currently used at Neo- 
data, which uses a simple greedy algorithm: 

1. after a learning phase; 

2. if a campaign is achieving its target at the current rate, nothing is done;
otherwise the campaign is stopped in its less profitable nodes.

The data we used were the logs and the creative schedules of two clients
of Neodata, which we call A and B. We have considered the data of April
2010 for both companies. We must remark that the traffic managed by Neodata
is neither the total traffic nor a constant percentage of the traffic
generated by the sites under consideration. This makes the problem
of properly estimating the supply much harder (or even impossible).

The prototype achieved the following results: A was optimized equally well
by the current optimizer and our prototype, whereas B was optimized better
by a large margin (more than 20%) by our code. We do not know for sure why
the data on A did not show a similar improvement. Possible reasons are: there
is no room for further improvement, or the data on the supply cannot be used
for the projection because it does not correspond to a constant percentage of
the real traffic.

The data was used as follows: the initial portion of the month (e.g. the first
20 days) was used for training the system, i.e. projecting the supply (traffic)
and the profits. The remaining part of the month was used as a schedule and
was optimized.



16 Conclusion 

Our prototype has shown that real data can be indeed optimized better than 
what a greedy algorithm does. There are still some open issues: how to correctly 
project the supply when the conditions of the problem change quickly and the 
data does not correspond to a constant percentage of the traffic.